Theory and Applications of Neural Networks

Final Technical Report. 0-01-89 to 2-29-92.

Co-Investigators: R. Brockett, L. Valiant, R. Westervelt and A. Yuille

This report is organized into four main sections. Each section gives a summary of our work
performed in the area. Section (2) describes dynamical systems for analog computation with
applications to optimization, adaptive filtering, and robust coding respecting the practical
constriction of limited dynamic range. In Section (3) we analysed discrete neural net models to
suggest practical learning algorithms. Section (4) analyzes the convergence and stability of a new
class of clocked neural network models. Section (5) is based on the use of ideas from statistical
physics in modelling, unification and algorithm generation.

0.1 Dynamical Systems for Analog Computation

One of the main ideas underlying the interest in neural computing is that it may be possible to
develop new computational paradigms that will make important aspects of programming both
simpler and more robust. The means for doing so usually involves setting up some “universal”
difference or differential equation whose trajectories define rules for solving problems in curve
fitting, interpolation, etc. Our work [1-6] has addressed the use of analog computation methods
for optimization as well as sorting, quantizing, etc. Using a simple, but powerful, mathematical
model we have shown how basic subsystems can provide the building blocks that are capable of
accounting for the operations that we see being performed by biological and digital computers.
More specifically, in our papers [1], [2] we have shown that a certain class of gradient flows on the
n-dimensional orthogonal group generates effective means for solving a variety of combinatorial
and linear algebra problems of the type that shows up in the neural network literature. A key idea
here is that of an adaptive subspace filter - a general model for nonlinear filtering of the type seen
in various cognitive applications. This model not only allows one to study global convergence in a
precise way, but it allows one to make analytical predictions about the speed of convergence which
can then be compared with the performance of natural systems. We have shown [1] that some of
our earlier analog models for sorting can be interpreted as conditional density propagators. This
is important because it shows that, in a certain probabilistic sense, these are the best algorithms
for doing the task in question. We have also further illuminated the connection between this work
and the Toda lattice equations [4], [5], [6] and in that way have been able to provide the explicit
expressions for trajectories associated with some systems of this type.
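
As a rough illustration of the kind of flow meant here, the sketch below integrates a
double-bracket equation dH/dt = [H, [H, N]] on symmetric matrices. The report does not write the
equation down, so this specific form, the choice N = diag(1, ..., n), the forward Euler
integrator, and all function names are assumptions made for illustration only. For a tridiagonal
starting matrix and this N, the flow is known to coincide with the Toda lattice in Flaschka's
variables, which is why a tridiagonal test matrix is used.

```python
def matmul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def bracket(a, b):
    """Matrix commutator [a, b] = ab - ba."""
    ab, ba = matmul(a, b), matmul(b, a)
    n = len(a)
    return [[ab[i][j] - ba[i][j] for j in range(n)] for i in range(n)]


def double_bracket_flow(h0, dt=0.001, steps=10000):
    """Forward-Euler integration of dH/dt = [H, [H, N]] with N = diag(1..n).

    The equation and the integrator are illustrative assumptions, not taken
    from the report.
    """
    n = len(h0)
    big_n = [[float(i + 1) if i == j else 0.0 for j in range(n)]
             for i in range(n)]
    h = [[float(v) for v in row] for row in h0]
    for _ in range(steps):
        velocity = bracket(h, bracket(h, big_n))
        h = [[h[i][j] + dt * velocity[i][j] for j in range(n)]
             for i in range(n)]
    return h


# Symmetric tridiagonal start; its eigenvalues are 2 - sqrt(2), 2, 2 + sqrt(2).
start = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]
final = double_bracket_flow(start)
print([round(final[i][i], 2) for i in range(3)])  # diagonal of the limit
```

The off-diagonal entries decay and the diagonal approaches the eigenvalues of the starting
matrix, ordered like the diagonal of N (ascending here), which is the sense in which such a flow
"sorts". Forward Euler preserves the spectrum only approximately, so a smaller step or a
structure-preserving integrator would be needed for higher accuracy.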

0.2 Learning Theory and Discrete Neural Nets

We have been fortunate to have had two excellent postdoctoral fellows, Nick Littlestone and Robert
Schapire. They are among the leading contributors to computational learning theory. Their work
on this grant is described in the six publications below. Each paper addresses in some way the
problem of how a function can be learned from seeing examples and counterexamples of it using
only feasible computational resources.

A brief summary of some of these papers is as follows: (i) Successful studies of the learning
curves of neural nets exist in two very different frameworks. One is the Vapnik-Chervonenkis
dimension from statistics, and the other is the language of statistical physics. In reference [1]
a very elegant unification of these two approaches is presented. (ii) A few years ago Littlestone
discovered a learning algorithm for perceptrons that is as simple as the classical one, but can be
shown to outperform it greatly in the case that a large number of the dimensions are irrelevant. In
most realistic settings of learning this is the case since the relevant attributes cannot be identified
a priori in general. In reference [2] Littlestone demonstrates further properties of his algorithm,
in the important case that some errors in the data have to be allowed for. (iii) On the subject of
coping with large numbers of attributes that are irrelevant but not identified as such, in reference
[3] the first general transformations are presented for translating any one from a wide class of
learning algorithms to one that is attribute efficient. (iv) In [4] Schapire gives one of the most
general positive analytic results known for learning a class of functions. Reference [5] introduces
novel techniques to prove that certain classes that are not learnable in the general case are learnable
for restricted input distributions.
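
The algorithm referred to in item (ii) is not named in this report; the sketch below assumes it is
the multiplicative-update rule known in the literature as Winnow (here the simplest variant,
Winnow1, for monotone disjunctions). The threshold n/2, the factor 2, the synthetic data generator
and every identifier are illustrative choices rather than details taken from the cited papers.

```python
import math
import random


def winnow(examples, n, alpha=2.0):
    """Online Winnow1 (assumed variant); returns (weights, mistake count)."""
    theta = n / 2.0                      # threshold, one common choice
    weights = [1.0] * n
    mistakes = 0
    for x, label in examples:
        active = [i for i in range(n) if x[i]]
        predicted = 1 if sum(weights[i] for i in active) >= theta else 0
        if predicted == label:
            continue                     # weights change only on mistakes
        mistakes += 1
        for i in active:
            if label == 1:
                weights[i] *= alpha      # promotion after a false negative
            else:
                weights[i] = 0.0         # elimination after a false positive
    return weights, mistakes


def sample_disjunction(n, relevant, count, seed=0):
    """Sparse random inputs labelled by an OR over the relevant attributes."""
    rng = random.Random(seed)
    examples = []
    for _ in range(count):
        x = [1 if rng.random() < 0.05 else 0 for _ in range(n)]
        label = 1 if any(x[i] for i in relevant) else 0
        examples.append((x, label))
    return examples


N_ATTRIBUTES = 200
RELEVANT = [3, 57, 120]                  # only 3 of 200 attributes matter
data = sample_disjunction(N_ATTRIBUTES, RELEVANT, 2000)
weights, mistakes = winnow(data, N_ATTRIBUTES)
bound = 2 * len(RELEVANT) * math.log2(N_ATTRIBUTES) + 2
print(mistakes, "mistakes; bound", round(bound, 1))
```

For these parameter choices the standard mistake bound for Winnow1 is 2k log2(n) + 2 with k
relevant attributes out of n, so the number of mistakes grows only logarithmically in the number
of irrelevant dimensions.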

0.3 Planar Neural Networks for Vision and Sound Processing

We conducted a systematic study of the stability of neural networks with feedback, including
the delays which occur in real implementations [1]. The goal of an architecture which could be
stably implemented in hardware led us to develop the clocked neural network [2,3] in which all
neurons are updated synchronously on a clock pulse. The effects of clocking are analogous to
those for digital computers: synchronous update helps to stabilize the network by eliminating
timing ambiguities caused by varying signal paths or neuron delays. One of our most significant
achievements is the proof of a global stability criterion [2] for clocked neural networks with arbitrary
symmetric interconnections. The conditions of the proof are sufficiently general that the results
guarantee stability for real implementations of clocked networks in hardware; for example, the
neuron transfer characteristics are continuous and can differ from neuron to neuron. We have
applied this approach to redesign feedback associative memories of the type originally considered
by Hopfield as clocked neural networks, and have computed “phase diagrams” which specify the
stable operating region in terms of the neuron gain and the number of stored memories [4,5]. We
have also computed the number of small spurious attractors which can prevent the state of the
system from reaching a stored memory [6,7,8]. The sum of this work constitutes a solution to the
mathematical problem of stability in many types of feedback neural networks with a wide variety
of interconnection topologies and interconnection strengths.
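
To make the role of neuron gain concrete, here is a minimal sketch of a clocked (synchronous)
update with a continuous transfer function. The tanh nonlinearity, the two-neuron weight matrix,
the tolerance and all names are assumptions chosen for illustration; the report does not specify
them, and the sketch does not implement the stability criterion of [2] itself.

```python
import math


def clocked_step(weights, state, gain):
    """Update every neuron at once from the previous state (one clock pulse)."""
    n = len(state)
    return [math.tanh(gain * sum(weights[i][j] * state[j] for j in range(n)))
            for i in range(n)]


def classify_attractor(weights, state, gain, steps=200, tol=1e-9):
    """Iterate the clocked map and report what the trajectory settles into."""
    prev = list(state)
    cur = clocked_step(weights, prev, gain)
    for _ in range(steps):
        nxt = clocked_step(weights, cur, gain)
        if max(abs(a - b) for a, b in zip(nxt, cur)) < tol:
            return "fixed point"
        if max(abs(a - b) for a, b in zip(nxt, prev)) < tol:
            return "period-2 cycle"
        prev, cur = cur, nxt
    return "undecided"


# Two neurons with symmetric inhibitory coupling (hypothetical example).
W = [[0.0, -1.0], [-1.0, 0.0]]
low_gain = classify_attractor(W, [0.5, 0.5], gain=0.5)
high_gain = classify_attractor(W, [0.5, 0.5], gain=5.0)
print(low_gain, "/", high_gain)
```

With the same symmetric weights and the same starting state, the low-gain network relaxes to a
fixed point while the high-gain network locks into a two-step oscillation. A stability criterion
expressed in terms of gain is what separates these two regimes.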

0.4 Statistical Physics for Modelling, Optimization and Unification

Our research supported by this grant concentrated on five areas based on the statistical physics
approach to neural networks: (i) optimization for combinatorial problems using deterministic
annealing, (ii) using deformable templates for high energy particle detection, (iii) modelling binocular
stereo visual perception and relating it to psychophysics, (iv) modelling the development of the
visual cortex, and (v) enabling these statistical physics models to incorporate techniques used in
statistics.

There are three main themes in this work: (a) using energy functions and the Gibbs distribution
to model problems in terms of finding the optimal statistical estimators, (b) using deterministic
annealing to obtain the estimators, and (c) unification by using techniques from statistical physics
to show relationships between different theories. These themes are emphasized in a short review
paper [1].

In the work on optimization we focussed initially on the assignment problem [2,3] though we
are currently generalizing this work. We proved that two dynamical systems using deterministic
annealing were guaranteed to converge to the optimal solution and gave bounds on the convergence
times of these systems. We also found intriguing relations between these dynamical systems
and more traditional approaches to these problems such as the auction algorithm, interior point
methods and linear programming with barrier functions [11].
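
The general idea of deterministic annealing on the assignment problem can be sketched as follows.
This is not necessarily either of the two dynamical systems analysed in [2,3]: it is a
Sinkhorn-style alternating row and column normalization of a Gibbs matrix exp(-cost/T), swapped in
here because it is short and well known. The cooling schedule, sweep count, cost matrix and all
names are assumptions.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def anneal_assignment(cost, t_start=2.0, t_end=0.05, cooling=0.9, sweeps=100):
    """Cool a doubly normalized Gibbs matrix; return (match matrix, assignment)."""
    n = len(cost)
    row_pot = [0.0] * n    # one potential per row
    col_pot = [0.0] * n    # one potential per column
    t = t_start
    while True:
        for _ in range(sweeps):
            # Choose column potentials so that every column sums to one.
            col_pot = [-t * logsumexp([(row_pot[i] - cost[i][j]) / t
                                       for i in range(n)]) for j in range(n)]
            # Choose row potentials so that every row sums to one.
            row_pot = [-t * logsumexp([(col_pot[j] - cost[i][j]) / t
                                       for j in range(n)]) for i in range(n)]
        if t * cooling < t_end:
            break
        t *= cooling       # lower the temperature, keep the potentials
    match = [[math.exp((row_pot[i] + col_pot[j] - cost[i][j]) / t)
              for j in range(n)] for i in range(n)]
    assignment = [max(range(n), key=lambda j: match[i][j]) for i in range(n)]
    return match, assignment


COST = [[4, 1, 3],
        [2, 0, 5],
        [3, 2, 2]]
match, assignment = anneal_assignment(COST)
total = sum(COST[i][assignment[i]] for i in range(3))
print(assignment, total)
```

At high temperature the match matrix is nearly uniform; as the temperature falls it sharpens
toward a permutation matrix. Carrying the potentials from one temperature to the next is what
makes this an annealing scheme rather than a sequence of independent solves, and their role as
prices is one way to see the relation to auction-type methods mentioned above.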

We applied deformable templates in a statistical physics framework for the detection of particles
in high energy physics experiments [4,5]. The resulting system, using deterministic annealing,
performed well on simulated 2D data given to us by the LEP lab in CERN. This was extended to
work on 3D data [9].

The work on stereo [6] proposed a general formulation of stereo, showed that we could use
statistical physics techniques to relate previous theories to this framework, proposed deterministic
annealing as an optimization strategy, and finally showed that the resulting theory was consistent
with a number of psychophysical experiments.

By embedding models of self-organization of the visual cortex within a statistical physics
framework we were able to show [7] precise, and hitherto unexpected, relations between existing
theories of ocularity and the spatial organization of orientation. This emphasized the importance
of the optimization criteria over the mechanism proposed for minimizing them. Small variations in
the criteria can lead to very different experimental predictions.

Other work [8] investigated a toy theory of stereo transparency perception, showed that the
statistical physics approach could incorporate techniques from robust statistics, and that the
Hough/Radon transform could appear as a special case in the zero temperature limit. We also
[10] succeeded in determining the phase transitions for a class of texture synthesis models.